Towards a Parser for Mathematical Formula Recognition
Abstract
For the transfer of mathematical knowledge from paper to electronic form, the reliable automatic analysis and understanding of mathematical texts is crucial. A robust system for this task needs to combine low-level character recognition with higher-level structural analysis of mathematical formulas. We present progress towards this goal by extending a database-driven optical character recognition system for mathematics with two high-level analysis features. The first extends and enhances the traditional approach of projection profile cutting. The second aims at integrating the recognition process with graph grammar rewriting by supporting the interactive construction and validation of grammar rules. Both approaches can be successfully employed to enhance the capabilities of our system to recognise and reconstruct compound mathematical expressions.
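As a rough illustration of the traditional projection profile cutting mentioned in the abstract, the sketch below recursively splits a binary image along blank gaps in its column and row profiles. It shows only the classical technique, not the extended variant the authors describe, and all names (`runs`, `profile`, `ppc`, the box layout) are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

# A box is (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]
Image = List[List[int]]  # 1 = ink, 0 = background


def runs(values: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the profile is non-zero."""
    out, start = [], None
    for i, v in enumerate(values):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(values)))
    return out


def profile(img: Image, box: Box, axis: int) -> List[int]:
    """Ink count per row (axis=0) or per column (axis=1) inside the box."""
    top, left, bottom, right = box
    if axis == 0:
        return [sum(img[r][left:right]) for r in range(top, bottom)]
    return [sum(img[r][c] for r in range(top, bottom))
            for c in range(left, right)]


def _cut(img: Image, box: Box, axis: int, other_failed: bool) -> List[Box]:
    top, left, bottom, right = box
    segments = runs(profile(img, box, axis))
    subs = [
        (top + s, left, top + e, right) if axis == 0
        else (top, left + s, bottom, left + e)
        for s, e in segments
    ]
    if len(subs) == 1:
        # No blank gap along this axis: stop if the other axis also
        # failed to split, otherwise try the other axis once.
        if other_failed:
            return subs
        return _cut(img, subs[0], 1 - axis, True)
    boxes: List[Box] = []
    for sub in subs:
        boxes.extend(_cut(img, sub, 1 - axis, False))
    return boxes


def ppc(img: Image, box: Optional[Box] = None, axis: int = 1) -> List[Box]:
    """Recursive projection profile cutting, starting with vertical cuts."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    return _cut(img, box, axis, False)


# A tall symbol on the left and two stacked strokes on the right.
image = [
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 1],
]
boxes = ppc(image)
```

Here the first vertical cut separates the left symbol from the right group, and a horizontal cut then separates the two stacked strokes. Plain cutting of this kind cannot separate components that overlap in both projections (for example a radical sign enclosing its argument), which is the sort of limitation an extended approach would need to address.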
Similar resources
Table recognition in mathematical documents
While a number of techniques have been developed for table recognition in ordinary text documents, when dealing with tables in mathematical documents these techniques are often ineffective as tables containing mathematical structures can differ quite significantly from ordinary text tables. In fact, it is even difficult to clearly distinguish table recognition in mathematics from layout analysi...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...
A Stochastic Finite-State Morphological Parser for Turkish
This paper presents the first stochastic finite-state morphological parser for Turkish. The non-probabilistic parser is a standard finite-state transducer implementation of two-level morphology formalism. A disambiguated text corpus of 200 million words is used to stochastize the morphotactics transducer, then it is composed with the morphophonemics transducer to get a stochastic morphological ...
An efficient syntactic approach to structural analysis of on-line handwritten mathematical expressions
Machine recognition of mathematical expressions is not trivial even when all the individual characters and symbols in an expression can be recognized correctly. In this paper, we propose to use definite clause grammar (DCG) as a formalism to define a set of replacement rules for parsing mathematical expressions. With DCG, we are not only able to define the replacement rules concisely, but their de...
A New Top-Down Context-Free Parsing for Syntactic Pattern Recognition
The many mathematical methods used to solve pattern recognition problems may be grouped into two general approaches: the decision-theoretic approach and the syntactic (structural) approach. In this paper, the syntactic pattern recognition method and formal grammars are described first, and then one of the techniques in syntactic pattern recognition is investigated, called top –d...
Publication date: 2006